Scalable BLAS 2 and 3 Matrix Multiplication for Sparse Banded Matrices on Distributed Memory MIMD Machines

Authors

Nikos Chrisochoides

Mokhtar Aboelaze

Elias Houstis

Abstract

In this paper, we present two algorithms for sparse banded matrix-vector and sparse banded matrix-matrix product operations on distributed memory multiprocessor systems that support mesh and ring interconnection topologies. We also study the scalability of these two algorithms. We employ systolic-type techniques to eliminate synchronization delay and minimize the communication overhead among processors. The performance of the algorithms developed for these operations depends on the bandwidth of the matrices involved. The algorithms have been implemented on the NCUBE II with 64 processors. Our preliminary experimental data agree with the expected theoretical behavior.
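
The abstract does not give the algorithms themselves, so the following is only an illustrative sketch of why the cost of a banded matrix-vector product depends on the bandwidth. It is not the authors' systolic scheme. The diagonal-offset storage layout, the function names `banded_matvec` and `ring_banded_matvec`, and the block-row partitioning are all assumptions made for this example. The second function simulates the data each processor on a ring would need: its own block of `x` plus a "halo" of at most `w` entries on each side, where `w` is the largest diagonal offset.

```python
def banded_matvec(diags, x):
    """Sequential y = A x for a banded matrix.

    diags maps a diagonal offset d to the list of entries A[i][i + d]
    (hypothetical storage layout chosen for this sketch).
    """
    n = len(x)
    y = [0.0] * n
    for d, vals in diags.items():
        start = max(0, -d)  # first row that has an entry on diagonal d
        for k, a in enumerate(vals):
            i = start + k
            y[i] += a * x[i + d]
    return y


def ring_banded_matvec(diags, x, p):
    """Simulate a block-row partition of A and x over p processors.

    Each simulated processor only reads its own slice of x plus a halo
    of width w (the largest |offset|). When every block has at least w
    rows, that halo comes from the two ring neighbours only.
    """
    n = len(x)
    w = max(abs(d) for d in diags)
    y = [0.0] * n
    for r in range(p):
        lo, hi = r * n // p, (r + 1) * n // p
        h_lo, h_hi = max(0, lo - w), min(n, hi + w)
        local_x = x[h_lo:h_hi]  # owned entries plus halo
        for d, vals in diags.items():
            start = max(0, -d)
            for i in range(lo, hi):
                j = i + d
                if 0 <= j < n:
                    y[i] += vals[i - start] * local_x[j - h_lo]
    return y


# Tridiagonal example: 2 on the main diagonal, 1 on both neighbours.
tri = {-1: [1] * 5, 0: [2] * 6, 1: [1] * 5}
vec = [1, 2, 3, 4, 5, 6]
y_seq = banded_matvec(tri, vec)
y_ring = ring_banded_matvec(tri, vec, 3)
```

In this sketch the amount of data a processor reads from outside its own block grows with `w` and not with the matrix dimension, which is consistent with the abstract's statement that performance depends on the bandwidth of the matrices involved.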

Similar resources

PB-BLAS: a set of parallel block basic linear algebra subprograms

We propose a new software package which would be very useful for implementing dense linear algebra algorithms on block-partitioned matrices. The routines are referred to as block basic linear algebra subprograms (BLAS), and their use is restricted to computations in which one or more of the matrices involved consists of a single row or column of blocks, and in which no more than one of the matr...

A Parallel Computational Kernel for Sparse Nonsymmetric Eigenvalue Problems on Multicomputers

The aim of this paper is to show an effective reorganization of the nonsymmetric block Lanczos algorithm that is efficient, portable and scalable for multiple instructions multiple data (MIMD) distributed memory message passing architectures. Basic operations implemented here are matrix-matrix multiplications, eventually with a transposed and a sparse factor, LU factorisation and triangular systems sol...

Techniques for Parallel Manipulation of Sparse Matrices

New techniques are presented for the manipulation of sparse matrices on parallel MIMD computers. We consider the following problems: matrix addition, matrix multiplication, row and column permutation, matrix transpose, matrix-vector multiplication, and Gaussian elimination.

Elimination Forest Guided 2D Sparse LU Factorization

Sparse LU factorization with partial pivoting is important for many scientific applications, and delivering high performance for this problem is difficult on distributed memory machines. Our previous work has developed an approach called S that incorporates static symbolic factorization, supernode partitioning, and graph scheduling. This paper studies the properties of elimination forests and uses the...

A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectivel...

Publication date: 2007
